Enable "kick the tires" support for Nvidia GPUs in COS #45136
Enable "kick the tires" support for Nvidia GPUs in COS #45136
Conversation
Force-pushed from b880ab3 to bfa122d
Force-pushed from 5eddeb4 to 2bc0629
Force-pushed from bc47260 to 86c5ae4
Force-pushed from 795840d to 35a6810
Force-pushed from 4daf007 to 6c4372d
@@ -1593,4 +1593,5 @@ else
fi
reset-motd
prepare-mounter-rootfs
modprobe configs
For IKCONFIG? Note it is not listed in the GKE node image spec https://docs.google.com/document/d/1qmiJOuLYqjJZF-PTfn-xvTbLvzTgFw8gMcWavX7qiQ0/edit
So we may have to double check our images to ensure they are built with this module.
Good point. How are folks expected to discover the doc you posted? Can you add an entry for /proc/configz?
Oh, Nvidia driver installation may not require configs on all distros. I'm not sure about that.
@dchen1107 is the person to talk to about changing the node spec
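For reference, a minimal sketch of what the configs module buys the installer (standard Linux behavior when CONFIG_IKCONFIG_PROC is enabled; writing the result into the kernel source tree is an assumption about this script):

if ! modprobe configs; then
  echo "kernel was built without the configs module; cannot read the running config" >&2
  exit 1
fi
# The running kernel's build configuration appears at /proc/config.gz.
zcat /proc/config.gz > "${KERNEL_SRC_DIR}/.config"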
ENV DEBIAN_FRONTEND noninteractive

RUN apt-get -qq update
RUN apt-get install -qq -y pciutils gcc g++ git make dpkg-dev bc module-init-tools curl
I believe -qq implies -y
Oh sweet.
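Per the apt-get man page, -qq does imply -y, so the -y above is redundant. A sketch of the trimmed form, additionally merged into one layer (the merge is a common Dockerfile practice, not part of this diff):

# -qq silences output and implies -y
RUN apt-get -qq update && \
    apt-get install -qq pciutils gcc g++ git make dpkg-dev bc module-init-tools curl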
NVIDIA_DRIVER_PKG_NAME="NVIDIA-Linux-x86_64-375.26.run"

check_nvidia_device() {
  lspci
Maybe worth grepping for NVIDIA devices here too, for less output?
I can optimize it by wrapping around a command, but I actually like verbose output.
fair enough
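For illustration, a sketch of the grep-wrapped variant that was discussed (failing when nothing matches is an assumption about the desired behavior):

check_nvidia_device() {
  # grep prints only the matching PCI devices and exits non-zero on no match
  if ! lspci | grep -i nvidia; then
    echo "No Nvidia devices detected via lspci" >&2
    exit 1
  fi
}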
echo "Running the Nvidia driver installer ..."
if ! sh "${NVIDIA_DRIVER_PKG_NAME}" --kernel-source-path="${KERNEL_SRC_DIR}" --silent --accept-license --keep --log-file-name="${log_file_name}"; then
  echo "Nvidia installer failed, log below:"
Maybe also print where people can find the full log file
All the commands run by this script will be printed - set -x. So the tail command below will display the log file name.
ok
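A sketch of the failure branch with the log location called out explicitly anyway (the tail length is an assumption; log_file_name comes from the script above):

echo "Nvidia installer failed. Full log: ${log_file_name}"
tail -50 "${log_file_name}"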
test/e2e/generated/BUILD
Outdated
@@ -23,7 +23,7 @@ genrule(
name = "bindata",
srcs = [
"//examples:sources",
"//test/images:sources",
"//cluster/gce/gci/nvidia-gpus:sources//test/images:sources",
looks like you might have accidentally ended up with two srcs in the same string here
Yeah. Fixing it. Good catch.
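For reference, the corrected srcs list keeps the two labels as separate strings:

srcs = [
    "//examples:sources",
    "//cluster/gce/gci/nvidia-gpus:sources",
    "//test/images:sources",
],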
)

func makeCudaAdditionTestPod() *v1.Pod {
	podName := testPodNamePrefix + string(uuid.NewUUID())
reason to prefer this over GenerateName?
debugging....
fair point
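A sketch of the GenerateName alternative for comparison (import paths are the modern ones and the prefix value is hypothetical; the trade-off is the debugging point conceded above):

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const testPodNamePrefix = "cuda-add-" // hypothetical prefix

func makeCudaAdditionTestPod() *v1.Pod {
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			// The API server appends a random suffix on create, so no
			// client-side UUID is needed; but the final name is only
			// known after creation, which is less convenient in logs.
			GenerateName: testPodNamePrefix,
		},
	}
}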
	return testPod
}

func isClusterRunningCOS(f *framework.Framework) bool {
Might want to skip the master here, since theoretically you could have a non-cos master with cos nodes, and the nodes are where you care about GPU (though I don't think this os image skew happens in practice today). One way to do this would be to skip unschedulable nodes.
Why would unschedulable nodes have GPUs on them? Seems like an expensive unschedulable node.
Masters are marked unschedulable, so skipping unschedulables will skip masters. All I'm saying is that this test only cares that COS is on the nodes, and doesn't have to fail if the master is non-COS.
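A sketch of the suggested skip (Spec.Unschedulable is the standard v1.Node field; matching on the OSImage string is an assumption about how the COS check works):

func isClusterRunningCOS(nodeList *v1.NodeList) bool {
	for _, node := range nodeList.Items {
		// Masters are marked unschedulable, so this also skips a
		// non-COS master; only schedulable nodes run GPU pods.
		if node.Spec.Unschedulable {
			continue
		}
		if !strings.Contains(node.Status.NodeInfo.OSImage, "Container-Optimized OS") {
			return false
		}
	}
	return true
}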
func areGPUsAvailableOnAllSchedulableNodes(f *framework.Framework) bool {
	framework.Logf("Getting list of Nodes from API server")
	nodeList, err := f.ClientSet.Core().Nodes().List(metav1.ListOptions{})
you could pass nodeList into these helper functions, instead of re-listing on every call
I actually want to re-list.
ah, ok
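The re-list makes sense when the helper is the condition of a polling loop; a sketch (wait.Poll is from k8s.io/apimachinery/pkg/util/wait; the interval and timeout are assumptions):

err := wait.Poll(5*time.Second, 10*time.Minute, func() (bool, error) {
	// Each call re-lists nodes, so GPUs advertised by kubelets after
	// the installer daemonset finishes are eventually observed.
	return areGPUsAvailableOnAllSchedulableNodes(f), nil
})
framework.ExpectNoError(err)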
IMAGE = $(REGISTRY)/cuda-vector-add

build:
	docker build --pull -t $(IMAGE):$(TAG) .
consider specifying arch as well
Good point. I'd like to do it later since I don't know the multi-arch story for CUDA yet.
ok
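One way to record the arch in the meantime, following the name-suffix convention used elsewhere in the repo (the ARCH variable is an assumption, defaulting to the only arch COS on GCP supports):

ARCH ?= amd64
IMAGE = $(REGISTRY)/cuda-vector-add-$(ARCH)

build:
	docker build --pull -t $(IMAGE):$(TAG) .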
cluster/gce/gci/nvidia-gpus/Makefile
Outdated
# limitations under the License.

TAG=v0.1
REGISTRY=gcr.io/google_containers
consider ?= for TAG and REGISTRY
done. good idea.
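With ?=, values given on the command line or in the environment win, while plain make keeps the defaults:

TAG ?= v0.1
REGISTRY ?= gcr.io/google_containers

# e.g. make container REGISTRY=gcr.io/my-project TAG=v0.2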
Force-pushed from 6c4372d to 1b022fb
Addressed comments. PTAL.
all: container

container:
	docker build --pull -t ${REGISTRY}/${IMAGE}:${TAG} .
This is currently only meant for COS, which is restricted to amd64 on GCP. Do you think more code is useful?
Force-pushed from 76c18a9 to 1968f78
Changes to
@k8s-bot unit test this
@madhusudancs @nikhiljindal can either of you help me understand why the federation e2es are failing for this PR?
@k8s-bot kops aws e2e test this
@k8s-bot pull-kubernetes-federation-e2e-gce test this
…Optimized OS

Packaged the script as a docker container stored in gcr.io/google-containers.
A daemonset deployment is included to make it easy to consume the installer.
A cluster e2e has been added to test the installation daemonset along with verifying installation by using a sample CUDA application.
Node e2e for GPUs updated to avoid running on nodes without GPU devices.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Force-pushed from e106bd6 to 86b5edb
Signed-off-by: Vishnu kannan <vishnuk@google.com>
@vishh please ignore federation tests for now. We are actively debugging the problem. It is not "required" right now, so PRs can merge without it passing.
@madhusudancs Thanks for the tip.
# Otherwise, we respect whatever is set by the user.
MASTER_IMAGE=${KUBE_GCE_MASTER_IMAGE:-${GCI_VERSION}}
MASTER_IMAGE_PROJECT=${KUBE_GCE_MASTER_PROJECT:-google-containers}
DEFAULT_GCI_PROJECT=google-containers
since you are changing the default version, why refer to the google-containers project in the context of gci here and elsewhere? why not just switch the projects to cos-cloud?
I can change the defaults here too. Honestly I don't think it matters. All these hacks would hopefully disappear soon.
At this point, I want this PR to go in and then I can clean things up.
LGTM (one typo in a comment, but that's it)
# git checkout "tags/v${kernel_version_stripped}"
git checkout ${LAKITU_KERNEL_SHA1}

# Prepare kernel configu and source for modules.
s/configu/config
MASTER_IMAGE=${KUBE_GCE_MASTER_IMAGE:-${GCI_VERSION}}
MASTER_IMAGE_PROJECT=${KUBE_GCE_MASTER_PROJECT:-google-containers}
DEFAULT_GCI_PROJECT=google-containers
if [[ "${GCI_VERSION}" == "cos"* ]]; then
"cos"*
is a neat trick, will have to add that to my toolbox.
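The trick works because an unquoted right-hand side of == inside [[ ]] is treated as a glob pattern, not a literal string; a minimal illustration (the version value is an example):

GCI_VERSION="cos-beta-59-9460-20-0"
if [[ "${GCI_VERSION}" == "cos"* ]]; then   # matches any string starting with "cos"
  DEFAULT_GCI_PROJECT=cos-cloud
fi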
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: mtaufen, vishh. Assign the PR to them by writing
Needs approval from an approver in each of these OWNERS files:
You can indicate your approval by writing
@vishh: The following test(s) failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Automatic merge from submit-queue
@@ -49,21 +49,21 @@ images:
tests:
- 'resource tracking for 105 pods per node \[Benchmark\]'
gci-resource1:
image: gci-stable-56-9000-84-2
image: cos-beta-59-9460-20-0
project: google-containers
Projects need to be updated to cos-cloud in this config.
Oops. I thought I updated everywhere. Apologies.
}

restart_kubelet() {
  echo "Sending SIGTERM to kubelet"
Why do we need to restart Kubelet during the installation? This kind of dependencies could be error-prone: deploying and managing this nvidia-gpus daemonset depends on Kubelet, but in the middle of running the daemonset (installer.sh), Kubelet is restarted.
This is necessary for the kubelet to pick up the GPUs. Kubelet cannot support hotplugging GPUs for various reasons. We tried that with just PCI based data and it's proving to be hard.
Ideally, we need to reboot the entire node, but we haven't gotten there yet.
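For context, a sketch of what a SIGTERM-based restart amounts to (the pkill call and the systemd auto-restart are assumptions about this script and COS):

restart_kubelet() {
  echo "Sending SIGTERM to kubelet"
  # On COS the kubelet runs under systemd with automatic restart, so a
  # clean termination brings it back up; the restarted kubelet then
  # re-discovers the freshly installed GPUs (no hotplug support).
  pkill -SIGTERM kubelet
}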
Automatic merge from submit-queue (batch tested with PRs 46299, 46309, 46311, 46303, 46150) Fix cos image project to cos-cloud. Addressed #45136 (comment). @vishh @yujuhong @dchen1107
Automatic merge from submit-queue (batch tested with PRs 41563, 45251, 46265, 46462, 46721) Change kubemark image project to match new cos image project. The old project is not available anymore. #45136
This PR provides an installation daemonset that will install Nvidia CUDA drivers on Google Container Optimized OS (COS).
User space libraries and debug utilities from the Nvidia driver installation are made available in a special directory on the host:
- /home/kubernetes/bin/nvidia/lib for libraries
- /home/kubernetes/bin/nvidia/bin for debug utilities
Containers that run CUDA applications on COS are expected to consume the libraries and debug utilities (if necessary) from these host directories using HostPath volumes.
Note: This solution requires updating Pod Spec across distros. This is a known issue and will be addressed in the future. Until then CUDA workloads will not be portable.
This PR updates the COS base image version to m59. This is coupled with this PR for the following reason: instead of posting 3 PRs, one each for adding the basic installer, updating COS to m59, and then updating the installer again, this PR combines all the changes to reduce review overhead and latency, and the additional noise that would be created when GPU tests break.

Try out this PR:
export KUBE_GCE_ZONE=<zone-with-gpus> KUBE_NODE_OS_DISTRIBUTION=gci NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"
cluster/kube-up.sh
kubectl create -f cluster/gce/gci/nvidia-gpus/cos-installer-daemonset.yaml
Another option is to run an e2e manually to try out this PR:
KUBE_GCE_ZONE=<zone-with-gpus> KUBE_NODE_OS_DISTRIBUTION=gci NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"
go run hack/e2e.go -- --up
hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:GPU\]"
The e2e will install the drivers automatically using the daemonset and then run test workloads to validate driver integration.
TODO: